Goto

Collaborating Authors

 Cheshire


Comprehending Spatio-temporal Data via Cinematic Storytelling using Large Language Models

Shang, Panos Kalnis. Shuo, Jensen, Christian S.

arXiv.org Artificial Intelligence

Spatio-temporal data captures complex dynamics across both space and time, yet traditional visualizations are complex, require domain expertise and often fail to resonate with broader audiences. Here, we propose MapMuse, a storytelling-based framework for interpreting spatio-temporal datasets, transforming them into compelling, narrative-driven experiences. We utilize large language models and employ retrieval augmented generation (RAG) and agent-based techniques to generate comprehensive stories. Drawing on principles common in cinematic storytelling, we emphasize clarity, emotional connection, and audience-centric design. As a case study, we analyze a dataset of taxi trajectories. Two perspectives are presented: a captivating story based on a heat map that visualizes millions of taxi trip endpoints to uncover urban mobility patterns; and a detailed narrative following a single long taxi journey, enriched with city landmarks and temporal shifts. By portraying locations as characters and movement as plot, we argue that data storytelling drives insight, engagement, and action from spatio-temporal information. The case study illustrates how MapMuse can bridge the gap between data complexity and human understanding. The aim of this short paper is to provide a glimpse to the potential of the cinematic storytelling technique as an effective communication tool for spatio-temporal data, as well as to describe open problems and opportunities for future research.


Paper2Web: Let's Make Your Paper Alive!

Chen, Yuhang, Lv, Tianpeng, Zhang, Siyi, Yin, Yixiang, Wan, Yao, Yu, Philip S., Chen, Dongping

arXiv.org Artificial Intelligence

Academic project websites can more effectively disseminate research when they clearly present core content and enable intuitive navigation and interaction. However, current approaches such as direct Large Language Model (LLM) generation, templates, or direct HTML conversion struggle to produce layout-aware, interactive sites, and a comprehensive evaluation suite for this task has been lacking. In this paper, we introduce Paper2Web, a benchmark dataset and multi-dimensional evaluation framework for assessing academic webpage generation. It incorporates rule-based metrics like Connectivity, Completeness and human-verified LLM-as-a-Judge (covering interactivity, aesthetics, and informativeness), and PaperQuiz, which measures paper-level knowledge retention. We further present PWAgent, an autonomous pipeline that converts scientific papers into interactive and multimedia-rich academic homepages. The agent iteratively refines both content and layout through MCP tools that enhance emphasis, balance, and presentation quality. Our experiments show that PWAgent consistently outperforms end-to-end baselines like template-based webpages and arXiv/alphaXiv versions by a large margin while maintaining low cost, achieving the Pareto-front in academic webpage generation.


Evaluating Compliance with Visualization Guidelines in Diagrams for Scientific Publications Using Large Vision Language Models

Rückert, Johannes, Bloch, Louise, Friedrich, Christoph M.

arXiv.org Artificial Intelligence

Diagrams are widely used to visualize data in publications. The research field of data visualization deals with defining principles and guidelines for the creation and use of these diagrams, which are often not known or adhered to by researchers, leading to misinformation caused by providing inaccurate or incomplete information. In this work, large Vision Language Models (VLMs) are used to analyze diagrams in order to identify potential problems in regards to selected data visualization principles and guidelines. To determine the suitability of VLMs for these tasks, five open source VLMs and five prompting strategies are compared using a set of questions derived from selected data visualization guidelines. The results show that the employed VLMs work well to accurately analyze diagram types (F1-score 82.49 %), 3D effects (F1-score 98.55 %), axes labels (F1-score 76.74 %), lines (RMSE 1.16), colors (RMSE 1.60) and legends (F1-score 96.64 %, RMSE 0.70), while they cannot reliably provide feedback about the image quality (F1-score 0.74 %) and tick marks/labels (F1-score 46.13 %). Among the employed VLMs, Qwen2.5VL performs best, and the summarizing prompting strategy performs best for most of the experimental questions. It is shown that VLMs can be used to automatically identify a number of potential issues in diagrams, such as missing axes labels, missing legends, and unnecessary 3D effects. The approach laid out in this work can be extended for further aspects of data visualization.


OnGoal: Tracking and Visualizing Conversational Goals in Multi-Turn Dialogue with Large Language Models

Coscia, Adam, Guo, Shunan, Koh, Eunyee, Endert, Alex

arXiv.org Artificial Intelligence

As multi-turn dialogues with large language models (LLMs) grow longer and more complex, how can users better evaluate and review progress on their conversational goals? We present OnGoal, an LLM chat interface that helps users better manage goal progress. OnGoal provides real-time feedback on goal alignment through LLM-assisted evaluation, explanations for evaluation results with examples, and overviews of goal progression over time, enabling users to navigate complex dialogues more effectively. Through a study with 20 participants on a writing task, we evaluate OnGoal against a baseline chat interface without goal tracking. Using OnGoal, participants spent less time and effort to achieve their goals while exploring new prompting strategies to overcome miscommunication, suggesting tracking and visualizing goals can enhance engagement and resilience in LLM dialogues. Our findings inspired design implications for future LLM chat interfaces that improve goal communication, reduce cognitive load, enhance interactivity, and enable feedback to improve LLM performance.


MapStory: Prototyping Editable Map Animations with LLM Agents

Gunturu, Aditya, Pearman, Ben, Ihara, Keiichi, Faraji, Morteza, Wang, Bryan, Kazi, Rubaiat Habib, Suzuki, Ryo

arXiv.org Artificial Intelligence

We introduce MapStory, an LLM-powered animation prototyping tool that generates editable map animation sequences directly from natural language text by leveraging a dual-agent LLM architecture. Given a user written script, MapStory automatically produces a scene breakdown, which decomposes the text into key map animation primitives such as camera movements, visual highlights, and animated elements. Our system includes a researcher agent that accurately queries geospatial information by leveraging an LLM with web search, enabling automatic extraction of relevant regions, paths, and coordinates while allowing users to edit and query for changes or additional information to refine the results. Additionally, users can fine-tune parameters of these primitive blocks through an interactive timeline editor. We detail the system's design and architecture, informed by formative interviews with professional animators and by an analysis of 200 existing map animation videos. Our evaluation, which includes expert interviews (N=5) and a usability study (N=12), demonstrates that MapStory enables users to create map animations with ease, facilitates faster iteration, encourages creative exploration, and lowers barriers to creating map-centric stories.


CHART-6: Human-Centered Evaluation of Data Visualization Understanding in Vision-Language Models

Verma, Arnav, Mukherjee, Kushin, Potts, Christopher, Kreiss, Elisa, Fan, Judith E.

arXiv.org Artificial Intelligence

Data visualizations are powerful tools for communicating patterns in quantitative data. Yet understanding any data visualization is no small feat -- succeeding requires jointly making sense of visual, numerical, and linguistic inputs arranged in a conventionalized format one has previously learned to parse. Recently developed vision-language models are, in principle, promising candidates for developing computational models of these cognitive operations. However, it is currently unclear to what degree these models emulate human behavior on tasks that involve reasoning about data visualizations. This gap reflects limitations in prior work that has evaluated data visualization understanding in artificial systems using measures that differ from those typically used to assess these abilities in humans. Here we evaluated eight vision-language models on six data visualization literacy assessments designed for humans and compared model responses to those of human participants. We found that these models performed worse than human participants on average, and this performance gap persisted even when using relatively lenient criteria to assess model performance. Moreover, while relative performance across items was somewhat correlated between models and humans, all models produced patterns of errors that were reliably distinct from those produced by human participants. Taken together, these findings suggest significant opportunities for further development of artificial systems that might serve as useful models of how humans reason about data visualizations. All code and data needed to reproduce these results are available at: https://osf.io/e25mu/?view_only=399daff5a14d4b16b09473cf19043f18.


Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering

Chen, Zixin, Song, Sicheng, Shum, Kashun, Lin, Yanna, Sheng, Rui, Qu, Huamin

arXiv.org Artificial Intelligence

Misleading chart visualizations, which intentionally manipulate data representations to support specific claims, can distort perceptions and lead to incorrect conclusions. Despite decades of research, misleading visualizations remain a widespread and pressing issue. Recent advances in multimodal large language models (MLLMs) have demonstrated strong chart comprehension capabilities, yet no existing work has systematically evaluated their ability to detect and interpret misleading charts. This paper introduces the Misleading Chart Question Answering (Misleading ChartQA) Benchmark, a large-scale multimodal dataset designed to assess MLLMs in identifying and reasoning about misleading charts. It contains over 3,000 curated examples, covering 21 types of misleaders and 10 chart types. Each example includes standardized chart code, CSV data, and multiple-choice questions with labeled explanations, validated through multi-round MLLM checks and exhausted expert human review. We benchmark 16 state-of-the-art MLLMs on our dataset, revealing their limitations in identifying visually deceptive practices. We also propose a novel pipeline that detects and localizes misleaders, enhancing MLLMs' accuracy in misleading chart interpretation. Our work establishes a foundation for advancing MLLM-driven misleading chart comprehension. We publicly release the sample dataset to support further research in this critical area.


Protecting multimodal large language models against misleading visualizations

Tonglet, Jonathan, Tuytelaars, Tinne, Moens, Marie-Francine, Gurevych, Iryna

arXiv.org Artificial Intelligence

Visualizations play a pivotal role in daily communication in an increasingly data-driven world. Research on multimodal large language models (MLLMs) for automated chart understanding has accelerated massively, with steady improvements on standard benchmarks. However, for MLLMs to be reliable, they must be robust to misleading visualizations, charts that distort the underlying data, leading readers to draw inaccurate conclusions that may support disinformation. Here, we uncover an important vulnerability: MLLM question-answering accuracy on misleading visualizations drops on average to the level of a random baseline. To address this, we introduce the first inference-time methods to improve performance on misleading visualizations, without compromising accuracy on non-misleading ones. The most effective method extracts the underlying data table and uses a text-only LLM to answer the question based on the table. Our findings expose a critical blind spot in current research and establish benchmark results to guide future efforts in reliable MLLMs. Keywords: large language models, chart understanding, visualization In an increasingly data-driven world, visualizations are widely used by scientists, journalists, governments, or companies to efficiently communicate data insights to a broad audience [1]. The correct answer is colored in green, while the wrong answer supported by the misleader is colored in purple. In many cases, visualizations support a message more convincingly than if the underlying data table was shown directly to readers [3].


Preliminary Report: Enhancing Role Differentiation in Conversational HCI Through Chromostereopsis

Grella, Matteo

arXiv.org Artificial Intelligence

We propose leveraging chromostereopsis, Building upon traditional methods that a perceptual phenomenon inducing depth utilize color-coding and textual formatting, perception through color contrast, as a our approach employs a dark terminal novel approach to visually differentiating background to enhance the optical illusion conversational roles in text-based AI interfaces.


Grounded Language Design for Lightweight Diagramming for Formal Methods

Prasad, Siddhartha, Greenman, Ben, Nelson, Tim, Krishnamurthi, Shriram

arXiv.org Artificial Intelligence

Model finding, as embodied by SAT solvers and similar tools, is used widely, both in embedding settings and as a tool in its own right. For instance, tools like Alloy target SAT to enable users to incrementally define, explore, verify, and diagnose sophisticated specifications for a large number of complex systems. These tools critically include a visualizer that lets users graphically explore these generated models. As we show, however, default visualizers, which know nothing about the domain, are unhelpful and even actively violate presentational and cognitive principles. At the other extreme, full-blown visualizations require significant effort as well as knowledge a specifier might not possess; they can also exhibit bad failure modes (including silent failure). Instead, we need a language to capture essential domain information for lightweight diagramming. We ground our language design in both the cognitive science literature on diagrams and on a large number of example custom visualizations. This identifies the key elements of lightweight diagrams. We distill these into a small set of orthogonal primitives. We extend an Alloy-like tool to support these primitives. We evaluate the effectiveness of the produced diagrams, finding them good for reasoning. We then compare this against many other drawing languages and tools to show that this work defines a new niche that is lightweight, effective, and driven by sound principles.